Speed up dense/sparse vector stats #111729

Merged: 4 commits, Aug 12, 2024

jimczi
Contributor

@jimczi jimczi commented Aug 9, 2024

This change ensures that we don't try to compute stats on mappings that have no dense or sparse vector fields. Instead of going through all the fields on every segment, we extract the vector fields upfront and limit the work to indices that define these types.
This PR is marked as a performance bug because deployments with many fields or segments are affected when running index stats, even if they don't define a sparse or dense vector field.
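
The idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `VectorStatsSketch`, its methods, and the use of plain maps for the mapping (field name to type) and for segments (field name to value count) are all hypothetical stand-ins chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not the Elasticsearch implementation.
final class VectorStatsSketch {

    // Collect the vector field names from the mapping once, up front.
    // The mapping is modeled as field name -> type name (an assumption).
    static List<String> vectorFields(Map<String, String> mapping) {
        return mapping.entrySet().stream()
            .filter(e -> "dense_vector".equals(e.getValue()) || "sparse_vector".equals(e.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    // Sum vector value counts across segments, visiting only the vector fields.
    // Each segment is modeled as field name -> number of values (an assumption).
    static long countVectorValues(Map<String, String> mapping, List<Map<String, Integer>> segments) {
        List<String> fields = vectorFields(mapping);
        if (fields.isEmpty()) {
            // No vector fields in the mapping: skip the per-segment work entirely.
            return 0L;
        }
        long total = 0L;
        for (Map<String, Integer> segment : segments) {
            for (String field : fields) {
                total += segment.getOrDefault(field, 0);
            }
        }
        return total;
    }
}
```

With this shape, the cost for an index without vector fields is one pass over the mapping rather than a pass over every field of every segment.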

Closes #111715

@jimczi jimczi requested a review from kderusso August 9, 2024 00:03
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Aug 9, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

Member

@kderusso kderusso left a comment

Thanks for making this change so quickly!

@jimczi jimczi merged commit 59cf661 into elastic:main Aug 12, 2024
15 checks passed
@jimczi jimczi deleted the vector_stats_optim branch August 12, 2024 00:02
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.15

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Aug 12, 2024
elasticsearchmachine pushed a commit that referenced this pull request Aug 12, 2024
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 5, 2024
dnhatn added a commit that referenced this pull request Sep 10, 2024
If a segment doesn't contain any documents with a dense_vector field, 
but the mapping defines it, an NPE can occur when retrieving the
dense_vector stats.

Relates #111729
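
The failure mode in this commit message can be sketched as follows. This is a hypothetical illustration, not the actual fix: `NullSafeVectorCount` and `valuesInSegment` are made-up names, and each segment is modeled as a map from field name to value count, where a missing key stands in for a segment that has no documents with that field.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Elasticsearch fix.
final class NullSafeVectorCount {

    // Stand-in for a per-segment lookup (an assumption): returns null when
    // no document in the segment has a value for the field.
    static Integer valuesInSegment(Map<String, Integer> segment, String field) {
        return segment.get(field);
    }

    // Sum the values for one mapped field across all segments.
    static long count(String field, List<Map<String, Integer>> segments) {
        long total = 0L;
        for (Map<String, Integer> segment : segments) {
            Integer values = valuesInSegment(segment, field);
            // The field is in the mapping, but this segment may hold no values
            // for it; using the result without this check would throw an NPE.
            if (values != null) {
                total += values;
            }
        }
        return total;
    }
}
```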
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Sep 10, 2024
elasticsearchmachine pushed a commit that referenced this pull request Sep 10, 2024
Labels
>bug, :Search Relevance/Vectors (Vector search), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v8.15.1, v8.16.0
Development

Successfully merging this pull request may close these issues.

Engine#getSparseVectorValueCount seems rather expensive
3 participants